Single trial detection in encephalography

ABSTRACT

An EEG cap ( 8 ) having 64 or 128 electrodes ( 10 ) is placed on the head of the subject ( 11 ) who is viewing CRT monitor ( 14 ). The signals on each channel are amplified by amplifier ( 17 ) and sent to an analog-to-digital converter ( 20 ). PC ( 23 ) captures and records the amplified signals and the signals are processed by signal processing PC ( 26 ) performing linear signal processing. The resulting signal is sent back to a feedback/display PC ( 29 ) having monitor ( 14 ).

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 12/898,508, filed Oct. 5, 2010, which is a continuation of U.S. patent application Ser. No. 10/966,290, filed on Oct. 15, 2004 and issued as U.S. Pat. No. 7,835,787, which is a continuation of PCT/US03/13943, filed on May 5, 2003, which claims priority to U.S. Provisional Application No. 60/377,833, filed on May 3, 2002, the contents of which are hereby incorporated by reference in their entirety and from which priority is claimed.

NOTICE OF GOVERNMENT RIGHTS

The U.S. Government has certain rights in this invention pursuant to the terms of Defense Advanced Research Projects Agency (DARPA) contract N00014-010C-0482 and the Department of Defense Multidisciplinary University Research Initiative (MURI) program administered by the Office of Naval Research under Grant N00014-01-1-0625.

BACKGROUND OF INVENTION

The performance of a brain computer interface (BCI) can be optimized by considering the simultaneous adaptation of both a human and machine learner. Preferably, adaptation of both learners occurs on-line and in (near) real-time. The human and machine learners are assumed to process data sequentially, with the human learner gating the response of the machine learner. The gating by the human learner captures the dynamic switching between task-dependent strategies, while the machine learner constructs the mappings between brain signals and control signals for a given strategy (or set of strategies). The human and machine co-learn in that they adapt simultaneously to minimize an error metric, or equivalently, maximize a bit rate.

In a typical BCI system, signal acquisition from a human learner, or subject, is through one or more modalities (electroencephalography (EEG), magnetoencephalography (MEG), chronic electrode arrays, etc.). A key element of a BCI system is a machine learning or pattern recognition module to interpret the measured brain activity and map it to a set of control signals or, equivalently, a representation for communication, e.g., a visual display.

In addition to the machine learner, the human learner is integral to a BCI system. Adaptation of the human learner is often implicit, for example humans will switch strategies (e.g. think left/right versus up/down) based on their perceived performance. This dynamic switching by the human learner can make adaptation of the machine learner challenging, particularly since this can be viewed as making the input to the machine learner more non-stationary. Since the overall challenge in BCI is to maximize performance of the combined human-machine system (i.e., minimize error rate or conversely maximize bit rate) an approach is required which jointly optimizes the two learners.

Conventional analysis of brain activity using EEG and MEG sensors often relies on averaging over multiple trials to extract statistically relevant differences between two or more experimental conditions. Trial averaging is often used in brain imaging to mitigate low signal-to-interference ratios (SIR). For example, it is the basis for analysis of event-related potentials (ERPs) as explained in Coles M. G. H. et al., “Event-related brain potentials: An introduction,” Electrophysiology of Mind. Oxford: Oxford University Press (1995). However, for some encephalographic applications, such as seizure prediction, trial averaging is problematic. One application where the problem of trial averaging is immediately apparent is the brain computer interface (BCI), i.e., interpreting brain activity for real-time communication. In the simplest case, where one wishes to communicate a binary decision, averaging corresponds to asking the same question over multiple trials and averaging the subject's binary responses. In order to obtain high-bandwidth communication, it is desirable to do as little averaging over time or across trials as possible.

More generally, single-trial analysis of brain activity is important in order to uncover the origin of response variability, for instance, in analysis of error-related negativity (ERN). The ERN is a negative deflection in the EEG following perceived incorrect responses (Gehring et al., Psychological Science, 4(6):385-390 (1993); Falkenstein, M. et al., “ERP components on reaction errors and their functional significance: A tutorial,” Biological Psychology, 51:87-107, (2000)) or expected losses (Gehring, W. J. et al., “The medial frontal cortex and the rapid processing of monetary gains and losses,” Science, 295: 2279-2282 (2002)) in a forced-choice task. Single-trial detection of the ERN has been proposed as a means of correcting communication errors in a BCI system (Schalk et al., “EEG-based communication: presence of an error potential,” Clinical Neurophysiology, 111:2138-2144, (2000)). With the ability to analyze the precise timing and amplitude of the ERN on individual trials, one can begin to study parameters that cannot be controlled across trials, such as reaction time or error perception. Such an approach opens up new possibilities for studying the behavioral relevance and neurological origin of the ERN.

With the large number of sensors on a single subject in high-density EEG and magnetoencephalography (MEG), e.g., 32 or more sensors, an alternative approach to trial averaging is to integrate information over space rather than across trials. A number of methods along these lines have been proposed. Blind source separation analyzes the multivariate statistics of the sensor data to identify spatial linear combinations that are statistically independent over time (Makeig et al., “Independent component analysis of electroencephalographic data,” Advances in Neural Information Processing Systems, 8: 145-151, MIT Press (1996); Vigario et al., “Independent component approach to the analysis of EEG and MEG recordings,” IEEE Transactions on Biomedical Engineering, 47(5): 589-593 (2000); Tang et al., “Localization of Independent Components of Magnetoencephalography in Cognitive Tasks,” Neural Computation, 14(8): 1827-1858 (2002)). Separating independent signals and removing noise sources and artifacts increases SIR. However, blind source separation does not exploit the timing information of external events that is often available. In most current experimental paradigms subjects are prompted with external stimuli to which they are asked to respond. The timing of the stimuli, as well as the timing of overt responses, is therefore available, but is generally not exploited by the analysis method.

In the context of a BCI system, many methods have applied linear and nonlinear classification to a set of features extracted from the EEG. For example, adaptive autoregressive models have been used to extract features across a limited number of electrodes, with features combined using either linear or nonlinear classifiers to identify the activity from the time course of individual sensors (Pfurtscheller, G. et al., “Motor imagery and direct brain-computer communication,” Proceedings of the IEEE, 89(7):1123-1134, (2001)). Others have proposed to combine sensors in space by computing maximum and minimum eigenvalues of the sensor covariance matrices. The eigenvalues, which capture the power variations of synchronization and desynchronization, are then combined nonlinearly to obtain binary classification (Ramoser et al., “Optimal spatial filtering of single trial EEG during imagined hand movement,” IEEE Transactions on Rehabilitation Engineering, 8(4):441-446 (2000)). Spatial filtering has also been used to improve the signal-to-noise ratio (SNR) of oscillatory activity. However, there has been no systematic effort to choose optimal spatial filters. In the context of the ERN, Gehring et al. (1993) use linear discrimination to identify characteristic time courses in individual electrodes, but do not exploit spatial information. Although many of these aforementioned methods obtain promising performance in terms of classifying covert (purely mental) processes, their neurological interpretation remains obscure.

SUMMARY OF INVENTION

It is therefore an object of this invention to provide a system and method which will maximize performance of a BCI.

It is a further object of this invention to provide a system and method which will yield good single trial discrimination in a relatively short period of time.

These and other objects are accomplished by use of conventional linear discrimination to compute the optimal spatial integration of a large array of brain activity sensors. This allows exploitation of timing information by discriminating and averaging within a short time window relative to a given external event. Linear integration permits the computation of spatial distributions of the discriminating component activity, which in turn can be compared to functional neuroanatomy to evaluate the validity of the result. The term “component” instead of “source” is preferred so as to avoid confusion with an implied physiological source.

BRIEF DESCRIPTION OF THE DRAWINGS

Further objects, features, and advantages of the present invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings showing illustrative embodiments of the present invention, in which:

FIG. 1 is a block diagram of an exemplary embodiment of the system of the present invention;

FIG. 2 shows the results of a test using the present invention which predicts explicit (overt) motor response using 122 MEG sensors. In particular, FIG. 2A shows trial averages and standard deviations of discriminating components for left and right button pushes; FIG. 2B shows sensor projections for the discrimination vector; FIG. 2C shows the receiver operating characteristic (ROC) curve for left vs. right discrimination; and FIG. 2D shows the dipole-fit of the sensor projection overlaid on an MRI image;

FIG. 3 shows the results of a test using the present invention which classifies imagined (covert) motor activity using 59 EEG sensors. In particular, FIG. 3A shows trial averages and standard deviations of discriminating components for left and right imagined taps; FIG. 3B shows a dorsal view of sensor projections; FIG. 3C shows the receiver operating characteristic (ROC) curve for left vs. right discrimination; FIG. 3D shows the sensor projection of the discriminating component for explicit finger taps; and FIG. 3E shows the ROC curve for the same subject for an explicit finger tap; and

FIG. 4 shows the results of a test using the present invention which detects decision errors for a binary discrimination task using 64 EEG sensors. In particular, FIG. 4A shows trial averages and standard deviations of the discriminating component for correct and error trials; FIG. 4B shows the dorsal view of sensor projections; and FIG. 4C shows the ROC curve for error versus correct trials.

DETAILED DESCRIPTION OF THE INVENTION

An exemplary embodiment of the system of the present invention is shown in FIG. 1. An EEG cap 8 is placed on the head of a subject 11 who is viewing CRT monitor 14. The cap, such as available from Electro-Cap, Inc., may have any number of Ag/AgCl electrodes 10, with 64 or 128 electrodes, and a like number of corresponding output channels, being preferred. The signals on each channel are amplified by amplifier 17 and sent to analog-to-digital converter 20. From there, data acquisition PC 23 captures and records the amplified signals using commercially-available data acquisition software, such as Cogniscan™, available from Cognitronics, Inc., which can also provide lowpass/bandpass filtering. The signals are then processed by signal processing PC 26 which performs the linear signal processing described below. Appendix A provides exemplary software in accordance with the present invention for the signal processing PC 26. The resulting signal is sent to a feedback/display PC 29 having monitor 14. PCs 23, 26, 29 may each be a Dell 530, 2.4 GHz, or equivalent model. It will be understood that the functions of any two or all of PCs 23, 26, 29 may be combined into a single computer.

The signal processing performed by signal processing PC 26 may be broken into two types—linear discrimination and localization of discriminating components.

Linear Discrimination

As described below, a logistic regression model is used to learn an optimal linear discriminator using the spatial distribution of EEG activity across a high-density sensor array. Denoting x(t) as the M sensor values sampled at time instance t, spatial weighting coefficients v are computed such that

$\begin{matrix} {y(t) = v^{T}x(t)} & (1) \end{matrix}$

is maximally discriminating between the times t corresponding to two different experimental conditions. For example, in the prediction of explicit motor response experiments (an example of which is described below in Example I) the times correspond to a number of samples prior to an overt button push. The samples corresponding to a left button push are to be discriminated from samples of a right button push. For each of N trials there may be T samples, totaling NT training examples. Conventional logistic regression (Duda et al., Pattern Classification, John Wiley & Sons, 2nd Edition, (2001), incorporated herein by reference) is used to find v. A number of other linear classifiers were tested, including support vector machines (SVM) and perceptron (id.), as well as Gaussian classifiers, and all had essentially the same performance. After finding the optimal v we average over the T dependent samples of the k^(th) trial to obtain a more robust result, $\bar{y}_{k} = \frac{1}{T}\sum_{t \in T_{k}} y(t)$, where T_(k) denotes the set of sample times corresponding to trial k. Receiver operating characteristic (ROC) analysis (Swets, “Analysis applied to the evaluation of medical imaging techniques,” Investigative Radiology 14:109-121, (1979)) is done using these single-trial short-time averaged discrimination activities ($\bar{y}_{k}$). For visualization purposes, it is also useful to compute the trial averaged discrimination activities

$\begin{matrix} {{\bar{y}}_{e}(t) = {\frac{1}{|N_{e}|}{\sum\limits_{k \in N_{e}}{y_{k}(t)}}}} & (2) \end{matrix}$

where N_(e) denotes the set of samples for event e (e.g. left or right button push) with time measured relative to some common reference across trials. The separation of the means together with their corresponding variances gives an indication of whether single-trial discrimination is plausible within the analysis window.
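The procedure of Eq. (1) and the per-trial short-time average can be sketched as follows. This is a minimal illustration on synthetic data, not the software of Appendix A: the function names (`fit_logistic`, `trial_averages`), the plain gradient-descent optimizer, the bias term `b`, the small L2 penalty, and the array layout (trials × samples × sensors) are all assumptions made for the example.

```python
import numpy as np

def fit_logistic(X, labels, n_iter=500, lr=0.1, l2=1e-3):
    """Fit spatial weights v (and a bias b, an assumption of this sketch)
    by gradient descent on the logistic loss. X has shape (samples, M)."""
    n, m = X.shape
    v = np.zeros(m)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ v + b)))   # predicted probability of label 1
        err = p - labels
        v -= lr * (X.T @ err / n + l2 * v)       # gradient step with L2 shrinkage
        b -= lr * err.mean()
    return v, b

def trial_averages(X_trials, v):
    """X_trials has shape (N, T, M). Returns one averaged y per trial."""
    y = X_trials @ v                             # y(t) = v^T x(t), shape (N, T)
    return y.mean(axis=1)

# Synthetic stand-in for N trials of T samples on M sensors (hypothetical data).
rng = np.random.default_rng(0)
N, T, M = 40, 10, 8
trial_labels = np.repeat([0, 1], N // 2)
pattern = rng.normal(size=M)
X_trials = rng.normal(size=(N, T, M)) + 0.5 * trial_labels[:, None, None] * pattern

# Every sample of a trial inherits that trial's label: N*T training examples.
X = X_trials.reshape(N * T, M)
sample_labels = np.repeat(trial_labels, T)

v, b = fit_logistic(X, sample_labels)
y_bar = trial_averages(X_trials, v)
```

Here the evaluation is on the training data only, which overstates performance; the results reported in this description use a leave-one-out procedure instead.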

Localization of Discriminating Components

In order to provide a functional neuroanatomical interpretation of the resultant spatial weighting, a forward linear model is used to determine “sensor projections” of the discriminating component activity. In this model, y(t) is treated as a source which is maximally discriminating given the linear model and task. A simple way of visualizing the origin of a source's activity is to display the coupling coefficients of the source with the sensors. The strength of the coupling roughly indicates the closeness of the source to the sensor as well as its orientation. The coupling a is defined as the coefficients that multiply the putative source y(t) to give its additive contribution x_(y)(t) to the sensor readings, x_(y)(t)=ay(t). However, x_(y)(t) is not observable in isolation; instead we observe x(t)=x_(y)(t)+x_(y′)(t), where x_(y′)(t) represents the activity that is not due to the discriminating component. If the contributions, x_(y′)(t), of other sources are uncorrelated with y(t) we obtain the coupling coefficients by the least-squares solution (Haykin, Adaptive Filter Theory, Englewood Cliffs, N.J., Prentice-Hall, (1996)). Arranging the samples x(t) for different t as columns in the matrix X, and y(t) as a column vector y, the solution is given by

$\begin{matrix} {a = \frac{Xy}{y^{T}y}} & (3) \end{matrix}$

In general other sources are not guaranteed to be uncorrelated with the discriminating component. Therefore a represents the coupling of all component activities that are correlated to the discriminating component y(t). We refer to a as a “sensor projection,” as it measures the activity in the sensors that correlate with a given component. Our approach relies on the linearity of y(t) and the fact that different sources in EEG and MEG add linearly (Baillet, S. et al., “Electromagnetic brain mapping.” IEEE Signal Processing Magazine, 18(6): 14-30, 2001).
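The least-squares sensor projection of Eq. (3) reduces to a single matrix-vector product. The sketch below is illustrative only; the function name `sensor_projection`, the mean removal, and the noise-free rank-one test data are assumptions chosen so that the true coupling is known in advance.

```python
import numpy as np

def sensor_projection(X, v):
    """X has shape (M, T) with samples as columns; v has shape (M,).
    Returns a = X y / (y^T y) with y(t) = v^T x(t)."""
    X = X - X.mean(axis=1, keepdims=True)   # work with zero-mean observations
    y = v @ X                               # discriminating component, shape (T,)
    return X @ y / (y @ y)

# Hypothetical single-source data: x(t) = a_true * s(t), no other activity.
rng = np.random.default_rng(0)
a_true = np.array([1.0, -0.5, 0.25])
s = rng.normal(size=200)
X = np.outer(a_true, s)

# Any v with v^T a_true = 1 recovers s(t) exactly, so a should equal a_true.
v = a_true / (a_true @ a_true)
a = sensor_projection(X, v)
```

With additional sources present, `a` would also pick up whatever part of their activity is correlated with y(t), as described above.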

Sensor projection a was derived as follows. Assuming the observation vector is x, a linear classifier, y₁=v^(T)x, can be built where y₁ is the binary number indicating some cognitive event that we are trying to detect. A number of such cognitive events occurring simultaneously is assumed. These are represented as a vector of binary indicators y, with y₁ as its first element, and a matrix A that maps these to the observation vectors; i.e., x=Ay. Without restriction y is normalized to be zero mean. We wish to identify this mapping, namely to find the first column of A, which we call a and which is defined as the observation vector that would be obtained if only y₁ occurred. The most likely a can be found as follows. Let X be the zero mean observation matrix for many samples, i.e., the t^(th) column is the observation for the t^(th) sample. Let y₁ ^(T) be the corresponding binary column vector across these samples, with y₁=v^(T)X. The definition for a implies X=ay₁. The maximum likelihood estimate for a, given v and X, is given by the least-squares solution, a=Xy₁ ^(T)(y₁y₁ ^(T))⁻¹. We would like to determine the conditions under which the least-squares estimate of a is actually proportional to the first column of A. Let the matrix Y be the binary matrix of the simultaneous cognitive events across trials, i.e., the t^(th) column is the cognitive events vector y for the t^(th) trial. Since X=AY, we find that a=AYy₁ ^(T)(y₁y₁ ^(T))⁻¹. Note that Y has dimensions of number of cognitive events (N) by number of samples (T), and that the quantity Yy₁ ^(T) is the column vector of unnormalized correlations between the event indicator y₁ and the set of all cognitive events. If this is proportional to the Kronecker delta, δ_(j,1) (i.e., y₁ is uncorrelated with the indicators of the other events), then a_(i)∝Σ_(j)A_(i,j)δ_(j,1)=A_(i,1), and therefore a is proportional to the first column of A.
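The condition derived above can be checked numerically on a toy case. In this sketch the mixing matrix `A`, the two four-sample event matrices, and the helper name `projection_from_events` are hypothetical; the first event indicator is taken as known exactly, which stands in for a perfect classifier.

```python
import numpy as np

def projection_from_events(A, Y):
    """Least-squares estimate a = X y1^T (y1 y1^T)^-1 with X = A Y,
    where y1 is the first row of the zero-mean event matrix Y."""
    Y = Y - Y.mean(axis=1, keepdims=True)   # normalize indicators to zero mean
    X = A @ Y                               # observations, one column per sample
    y1 = Y[0]
    return X @ y1 / (y1 @ y1)

# Hypothetical mixing of two cognitive events into three sensors.
A = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 0.5]])

# Second event uncorrelated with the first across the four samples.
Y_uncorr = np.array([[1.0, 1, 0, 0],
                     [1, 0, 1, 0]])
# Second event correlated with the first.
Y_corr = np.array([[1.0, 1, 0, 0],
                   [1, 1, 1, 0]])

a_uncorr = projection_from_events(A, Y_uncorr)   # equals the first column of A
a_corr = projection_from_events(A, Y_corr)       # mixes in the second column
```

In the correlated case the estimate becomes the first column of A plus a multiple of the second, which is the sense in which a sensor projection reflects all activity correlated with the discriminating component.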

EXAMPLES

Example I Predicting Explicit (Overt) Motor Response Using MEG

Four subjects performed a visual-motor integration task. A “trump” experiment was defined whereby subjects were simultaneously presented with two visual stimuli on a CRT, one of which is the target and “trumps” (beats-out) the other. Subjects were instructed to push a left hand or right hand button, depending on which side the target (trump stimulus) was present. The subject was to discover the target by trial and error using auditory feedback. Each trial began with visual stimulus onset, followed by button push, followed by auditory feedback, indicating if the subject responded correctly. The interval between the motor-response and the next stimulus presentation was 3.0+/−0.5 sec. Each subject performed 90 trials, which took approximately 10 minutes. MEG data were recorded using 122 sensors at a sampling rate of 300 Hz and high-pass filtered to remove DC drifts. Dipole fits were done using the “xfit” tools available from Neuromag (www.neuromag.com), which assume a spherical head model to find a single equivalent current dipole.

Example II Classifying Imagined (Covert) Motor Activity Using EEG

Nine subjects performed a visual stimulus driven finger (L/R) tapping task. Subjects were asked to synchronize an explicit or imagined tap by the left, right, or both index fingers to the presentation of a brief temporally predictable signal. Subjects were trained until their explicit taps occurred consistently within 100 ms of the synchronization signal. Subjects were presented visual stimuli indicating with which index finger to tap and if it should be an explicit or imagined tap. 1.25 seconds after the last instruction symbol a fixation point was replaced for 50 ms by the letter “X.” This letter served as a signal to which the instructed tap (whether overt or imagined) was to be synchronized. Each trial lasted for 6 s. After training, each subject received 10 blocks of trials. Each 72-trial block consisted of nine replications of the eight trial types (Explicit vs. Imagined×Left vs. Right vs. Both vs. No Tap) presented in a random order. Trials with noise due to eye blinks were not considered in the EEG analysis. The electromyogram (EMG) was recorded to detect muscle activity during imagined movements. The 59 EEG channels were sampled at 100 Hz and high-pass filtered to remove DC components.

Example III Detection of Decision Errors from EEG

Seven subjects performed a task of visual target detection amongst distractors. On each trial, subjects were presented with a stimulus for 100 ms. There were four possible stimuli, each consisting of a row of five arrows. Subjects were told to respond by pressing a key on the side indicated by the center arrow. They were to ignore the four flanking arrows. On half of the trials, the flanking arrows pointed in the same direction as the target (e.g. <<<<<), on the other half the flankers pointed in the opposite direction (e.g. <<><<). Subjects were slower and made many more errors in the latter case. Following their response, there was an inter-trial interval of 1.5 seconds, after which a new stimulus was presented. Subjects performed 12 blocks of 68 trials each. The 100 ms interval prior to the response was used as the baseline period (separately for each trial and electrode). The sampling rate was 250 Hz. Following the baseline period, trials were manually edited to remove those with blinks, large eye movements, instrument artifacts and amplifier saturation.
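Baseline handling of the kind described for Example III, where the mean of the 100 ms before the response is removed separately for each trial and electrode, might look like the sketch below. The function name `baseline_correct`, the array layout (trials × channels × samples), the response index, and the synthetic data are assumptions for illustration; only the 100 ms window and the 250 Hz sampling rate come from the text.

```python
import numpy as np

def baseline_correct(epochs, fs, response_idx, baseline_ms=100.0):
    """Subtract, per trial and per channel, the mean over the baseline_ms
    immediately preceding the response sample.
    epochs has shape (trials, channels, samples)."""
    n_base = int(round(baseline_ms * fs / 1000.0))   # 25 samples at 250 Hz
    start = response_idx - n_base
    if start < 0:
        raise ValueError("epoch does not contain the full baseline window")
    baseline = epochs[:, :, start:response_idx].mean(axis=2, keepdims=True)
    return epochs - baseline

# Hypothetical epochs with a large DC offset on every channel.
rng = np.random.default_rng(1)
fs = 250.0
response_idx = 50
epochs = rng.normal(size=(5, 4, 100)) + 10.0
corrected = baseline_correct(epochs, fs, response_idx)
```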

Results

Single trial discrimination results are shown for Examples I-III and include trial averaged discriminating component activity $\bar{y}_{e}(t)$, sensor projections a, and detection/prediction performance using single-trial, short-time averaged $\bar{y}_{k}$. Performance is reported using ROC analysis computed with a leave-one-out training and testing procedure (Duda et al. (2001)). ROC analysis is a reasonable method for quantifying performance for these three data sets, since it enables one to incorporate an independent cost for false positives and false negatives. For example, in an error correction application using the ERN, it is important to detect error events with high confidence. The desired operating point of such a detector is therefore at a low false-positive rate (high specificity). In contrast, an application which looks to exploit motor imagery for communicating a binary decision is best assessed at the operating point where sensitivity equals specificity; i.e. the error rates for the two possible outcomes are equal. A metric that quantifies the overall performance of a detector for arbitrary operating points is the area under the ROC curve (A_(z)). Below we report A_(z) as well as the fraction of correct classification for all three tasks. A summary of the results for the three examples is given in the following table, where mean and standard deviation (SD) are reported across N subjects; N_(e) is the number of trials used to determine the best linear classifier (no. positive/no. negative trials):

TABLE 1

| Detection | ROC area (A_(z)), mean ± SD | Fraction correct, mean ± SD | N | N_(e) | Sensors | Time window |
| --- | --- | --- | --- | --- | --- | --- |
| Explicit L/R button push prediction | 0.82 ± 0.06 | 0.79 ± 0.09 | 4 | 45/45 | 122 MEG | 100 ms to 33 ms prior to button push |
| Imagined L/R finger tap discrimination | 0.77 ± 0.10 | 0.71 ± 0.08 | 9 | 90/90 | 59 EEG | 400 ms before to 400 ms after synchronization |
| Response error/correct discrimination | 0.79 ± 0.05 | 0.73 ± 0.05 | 7 | 40-80/300 | 64 EEG | 0 ms to 100 ms after response |

As seen in Table 1, for all three data sets the number of trials for training is comparable to the number of coefficients to be trained. This can lead to serious problems in overtraining. We mitigate these by including multiple training samples for each trial. These samples are obviously not independent; however, they provide evidence for the natural variation of the data and thus make the estimates much more robust. They were shown, through cross-validation, to improve estimated generalization performance. We would expect that increasing the number of independent training samples (e.g., trials) would similarly increase performance of the results presented below.
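The area under the ROC curve reported in the table can be estimated directly from per-trial scores. The sketch below uses the rank-based (Mann-Whitney) estimate, i.e., the fraction of positive/negative trial pairs that are ordered correctly; this estimator is an assumption of the example and is not necessarily how A_(z) was computed for the reported results. The function name `roc_area` and the scores are hypothetical.

```python
def roc_area(scores_pos, scores_neg):
    """Rank-based estimate of the area under the ROC curve: the probability
    that a positive trial scores higher than a negative one (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical short-time averaged discriminator outputs for six trials.
pos = [0.9, 0.8, 0.4]
neg = [0.5, 0.3, 0.1]
area = roc_area(pos, neg)   # 8 of the 9 pairs are ordered correctly
```

For a leave-one-out estimate, each trial's score would come from a classifier trained without any of that trial's samples, so that the dependent samples within a trial never appear on both sides of the split.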

FIG. 2 shows results for the Example I data set used to predict whether a subject will press a button with their left or right hand by analyzing the MEG signals in a window prior to the button push (left hand=1, right hand=0 in the logistic regression model). We use an analysis window 100 ms wide centered at 83 ms prior to the button event, which at 300 Hz corresponds to T=30. FIG. 2 shows the results for one subject. (A) shows trial averages $\bar{y}_{e}(t)$ (solid curves 42, 45) and standard deviations (dotted curves) of discriminating component for left and right button pushes, curves 42 and 45, respectively. Time is indicated in seconds. The vertical line at t=0 s indicates the timing of the button push. The vertical lines earlier in time mark the discrimination window. One can see significant separation of the means for left vs. right button push within the analysis window. Given that this separation is approximately equal to one standard deviation, this suggests that single trial discrimination is possible. (B) shows the sensor projections for the discrimination vector. Area 48 shows the highest activity. (C) shows the ROC curve for left vs. right discrimination. The area under the curve A_(z) is 0.93, indicating good single-trial discriminability. (D) shows the dipole-fit of a overlaid on an MRI image. A single equivalent current dipole fits the data with an accuracy of 64% using the least squares ‘xfit’ routine from Neuromag. This compares favorably with the 50% goodness of fit which is typically obtained for somatosensory responses when using all 122 sensors (Tang 2002). When considered with respect to the motor-sensory homunculus, these results indicate that the discrimination source activity originates in the sensory-motor cortex corresponding to the left hand.

FIG. 3 shows results for the Example II data set, where the goal is to detect activity associated with purely imagined motor response, a situation more realistic for a BCI system. Subjects are trained to imagine a tap with the left or right index finger synchronized to a brief, temporally predictable signal. Therefore there exists a known time window in which we can explicitly look for activity that discriminates between the left and right imagine conditions. A 0.8 s time window around the time where the task is to be performed was selected. 90 left and 90 right trials were available to train the coefficients of the 59 EEG sensors. The result for the best performing subject is A_(z)=0.90. (A) shows trial averages $\bar{y}_{e}(t)$ (solid curves 51, 54) and standard deviations (dotted curves) of discriminating component for left and right imagined taps, curves 51 and 54, respectively. The vertical solid line at t=0 seconds indicates the timing of the visual stimulus that defines the action to be performed (left or right imagine). The subjects are trained to execute the task at around t=1.25 s. The vertical lines after t=0 indicate the discrimination window. (B) shows a dorsal view of sensor projections a. Area 57 has the highest activity. (C) shows the ROC curve for left vs. right discrimination. For this subject the fraction of correct classification is p=0.79, which corresponds to an information transfer of 0.26 bits/trial. (D) shows the sensor projection of the discriminating component for explicit finger tap. Areas 60 have the highest activity. (E) shows the ROC curve for the same subject for an explicit finger tap.

The sensor projection of the 59 EEG sensors shows a clear left-right polarization over the motor area. In the context of BCI the metric of interest is the bit rate at which information can be transmitted with imagined motor activity. The information transmitted per trial is given by,

$\begin{matrix} {I = 1 + p\log_{2}(p) + (1 - p)\log_{2}(1 - p),} & (4) \end{matrix}$

where p is the fraction correct. As noted above, for the subject shown in FIG. 3 this corresponds to I=0.26 bits. When averaged over the nine subjects the information transmitted is I=0.16 bits/trial. Note that with a repetition rate of 6 seconds this experiment is not designed for an optimal transmission rate. Assuming instead a repetition rate of 0.8 seconds (corresponding to the time window used here for discrimination) we obtain an average bit rate of 12 bits/minute.
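The arithmetic behind these figures can be reproduced with a few lines. The sketch below evaluates Eq. (4) for a given fraction correct and converts bits per trial to bits per minute; the function names `bits_per_trial` and `bits_per_minute` are hypothetical, and the handling of p=0 and p=1 (where the p·log₂(p) terms are taken as zero) is an assumption added to keep the function defined at the endpoints.

```python
import math

def bits_per_trial(p):
    """Information per binary trial, I = 1 + p*log2(p) + (1-p)*log2(1-p)."""
    if p in (0.0, 1.0):
        return 1.0          # limiting value; avoids log2(0)
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)

def bits_per_minute(bits, trial_seconds):
    """Bit rate when one trial is completed every trial_seconds."""
    return bits * 60.0 / trial_seconds

single_subject = bits_per_trial(0.79)      # about 0.26 bits/trial
average_rate = bits_per_minute(0.16, 0.8)  # about 12 bits/minute
chance = bits_per_trial(0.5)               # no information at chance level
```

Note that the subject-averaged value of 0.16 bits/trial is an average of per-subject information values, so it need not equal Eq. (4) evaluated at the average fraction correct.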

For comparison, an alternative method, first described by Wolpaw et al. (1991), that is based on differences in the power spectrum in electrodes over the left and right motor cortex was also tested. Andersen et al., (“Multivariate Autoregressive Models for Classification of Spontaneous Electroencephalogram During Mental Tasks,” IEEE Transactions on Biomedical Engineering, 45(3):277-286, (1998)) modifies the approach by using six auto-regressive (AR) coefficients to model the power spectrum of each electrode within the analysis window and classify the imagined conditions using a linear discrimination on these AR coefficients. Following Penny et al. (2000), we used electrodes C3 and C4 (international 10/20 electrode placement system; see Towle et al., “The spatial location of EEG electrodes: locating the best-fitting sphere relative to cortical anatomy,” Electroencephalogr Clin. Neurophysiol., 86(1): 1-6, (1993)) and obtain A_(z)=0.65±0.09, and fraction correct of p=0.62±0.07, which corresponds to I=0.054 bits/trial or a bit rate of 4 bits/minute. This is about a third of the 12 bits/minute obtained with our proposed method.

The results, across the nine subjects, for predicting explicit finger taps from a window 300 ms to 100 ms prior to the taps are A_(z)=0.87±0.08 and a fraction correct of 0.80±0.08. As shown in FIG. 3, for one subject sensor projections of the discrimination vector for explicit motor response are similar to the projections of the imagined motor response. This is consistent with previous findings in EEG and fMRI (Cunnington et al., “Movement-related potentials associated with movement preparation and motor imagery,” Exp. Brain Res., 111(3):429-36, (1996); Porro et al., “Primary motor and sensory cortex activation during motor performance and motor imagery: a functional magnetic resonance imaging study,” J Neurosci., 16(23): 7688-98 (1996)) and supports the approach of many current BCI systems: signals arising from the cortical areas that encode an explicit movement are also in some sense optimal for detecting the imagined movement.

FIG. 4 shows the results for the target detection experiments where the goal is to detect the Error Related Negativity (ERN) on a single trial basis. The ERN has a medial-frontal distribution that is symmetric to the midline, suggesting a source in the anterior cingulate (Dehaene et al., “Localization of a neural system for error detection and compensation,” Psychological Science, 5: 303-305, (1994)). It begins around the time of the perceived incorrect response and lasts roughly 100 ms thereafter. We use this time window for detection. 40 to 80 error trials and 300 correct trials were used for training and testing 64 coefficients. (A) shows trial averages $\bar{y}_{e}(t)$ (solid curves 63, 67) and standard deviations (dotted curves) of the discriminating component for correct and error trials, curves 63 and 67, respectively. The negative deflection after a button push response at t=0 s is the ERN. Vertical lines at t=0 and t=100 ms indicate the discrimination window. (B) shows the dorsal view of sensor projections a. Area 70 has the highest activity. (C) shows the ROC curve for error vs. correct trials. The solid curve corresponds to discrimination using Eq. (1), and the dotted line to discrimination with the center electrode (FCz). The sensor projection shown in FIG. 4(B) for one subject is representative of the results obtained for other subjects and is consistent with the scalp topography and time course of the ERN. The detection performance for this subject was A_(z)=0.84 and is to be compared to A_(z)=0.63 when detecting ERN from the front-center electrode where maximal activity is expected (FCz in the 10/20 system).

The results of Examples I-III demonstrate the utility of linear analysis methods for discriminating between different events in single-trial, stimulus-driven experimental paradigms using EEG and MEG. An important aspect of our approach is that linearity enables the computation of sensor projections for the optimally discriminating weighting. This localization can be compared to the functional neuroanatomy, serving as a validation of the data-driven linear methods. In all three examples, the activity distribution correlated with the source that optimizes single-trial discrimination localizes to a region that is consistent with the functional neuroanatomy. This is important, for instance, in order to determine whether the discrimination model is capturing information directly related to the underlying task-dependent cortical activity, or is instead exploiting an indirect cortical response or other physiological signals correlated with the task (e.g. correlations with the stimulus, eye movements, etc.). Localization of the discriminating component activity and its correlates also enables one to determine the neuroanatomical correlations between different discrimination tasks, as was demonstrated for explicit and imagined motor responses in EEG.
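As a non-limiting sketch of how a sensor projection might be obtained from a linear weighting, the Python fragment below regresses the sensor data onto the discriminating component. It assumes the projection takes the least-squares form a = X y / (y^T y), where X holds the sensor data (channels by samples) and y = v^T X is the component; the symbols X, v and y and the function name are assumptions for this example, and the definition of the sensor projection given elsewhere in this specification governs.

```python
import numpy as np

def sensor_projection(X, v):
    """Least-squares projection of a linear component back onto the sensors.

    X : array of shape (n_channels, n_samples), sensor data (assumed layout)
    v : array of shape (n_channels,), spatial weighting coefficients
    Returns a : array of shape (n_channels,), the coupling of the component
    y = v^T X to each sensor, computed as a = X y / (y^T y).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(v, dtype=float) @ X   # component time course, one value per sample
    return X @ y / (y @ y)               # regression coefficient for each channel
```

Because each entry of a is a regression coefficient of one channel onto the component, a large entry indicates a sensor whose signal covaries strongly with the discriminating activity, which is what permits comparison of the resulting scalp map with the functional neuroanatomy.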

While this invention has been described with reference to several illustrative examples and embodiments, these should not be interpreted as limiting the scope or spirit of the invention. In actual practice many modifications may be made by those of ordinary skill in the art without deviating from the scope of the invention as expressed in the appended claims. For example, the system and method of the present invention may be applied to other encephalographic modalities with linear superposition of activity, such as functional infrared imaging (Boas et al., “Imaging the body with diffuse optical tomography,” IEEE Signal Processing Magazine, 18(6):57-75, (2001)).

We claim:
 1. A method for interpreting sampled brain activity signals of a subject subjected to at least a first stimulus and a second stimulus, comprising: (a) identifying, using a processing arrangement, one or more sets of samples from the sampled brain activity signals corresponding to a first response to the first stimulus; (b) identifying one or more sets of samples corresponding to a second response to the second stimulus; and (c) determining two or more spatial weighting coefficients such that a linear response maximally discriminates between the one or more sets of samples corresponding to the first response and the one or more sets of samples corresponding to the second response.
 2. The method of claim 1, wherein the sampled brain activity signals comprise two or more sensor signals generated by two or more corresponding members of an array of brain activity sensors in proximity to a head of the subject.
 3. The method of claim 2, further comprising amplifying each of the two or more sensor signals.
 4. The method of claim 3, further comprising converting each of the two or more sensor signals using an analog-to-digital converter.
 5. The method of claim 2, wherein determining two or more spatial coefficients comprises applying a linear classifier to each of the two or more sensor signals.
 6. The method of claim 1, wherein the first response comprises a motor response of the subject.
 7. The method of claim 1, wherein the first response comprises a right button push and a second response comprises a left button push.
 8. The method of claim 1, wherein the one or more samples corresponding to the first response comprise: a first sample corresponding to a first instance of the first response; and a second sample corresponding to a second instance of the first response.
 9. The method of claim 1, wherein the one or more samples corresponding to the first response comprise, for each sample, a predetermined number of samples prior to the first response.
 10. The method of claim 1, wherein the one or more samples corresponding to the first response comprise, for each sample, a predetermined number of samples following the first stimulus.
 11. The method of claim 1, wherein determining two or more spatial weighting coefficients comprises using a linear discrimination method.
 12. The method of claim 1, further comprising: obtaining a set of samples following a stimuli; weighting the set of samples using the two or more spatial weighting coefficients; and generating an output signal based on the weighted set of samples having a high correlation to one of the one or more sets of samples corresponding to the first response and the one or more sets of samples corresponding to the second response.
 13. The method of claim 12, wherein generating an output signal comprises linearly integrating the weighted set of samples.
 14. A system for interpreting sampled brain activity signals of a subject subjected to at least a first stimulus and a second stimulus, comprising: an input adapted to receive sampled brain activity signals; and a data processor, coupled to the input, programmed to: receive the sampled brain activity signals from the input; identify one or more sets of samples from the sampled brain activity signals corresponding to a first response to the first stimulus; identify one or more sets of samples corresponding to a second response to the second stimulus; and determine two or more spatial weighting coefficients such that a linear response maximally discriminates between the one or more sets of samples corresponding to the first response and the one or more sets of samples corresponding to the second response.
 15. The system of claim 14, further comprising an array of brain activity sensors coupled to the input.
 16. The system of claim 15, wherein the array of brain activity sensors are positioned on a cap.
 17. The system of claim 16, wherein the cap comprises an EEG cap.
 18. The system of claim 15, wherein the array of brain sensors comprises a plurality of silver chloride electrodes.
 19. The system of claim 14, further comprising an amplifier coupled to the input.
 20. The system of claim 14, further comprising an analog-to-digital converter coupled to the input. 